An Efficient Bandit Algorithm for √T-Regret in Online Multiclass Prediction?

Authors

  • Jacob D. Abernethy
  • Alexander Rakhlin
Abstract

Consider a sequence of examples (xt, yt) for t = 1, . . . , T, where xt ∈ Rⁿ and yt ∈ [K], and where the goal of a Learner is to predict the class yt from the input xt. In the more common full-information setting, the Learner observes the true class yt after making her prediction ŷt. In the present open problem, however, we will consider the so-called bandit setting: after predicting ŷt, the Learner is only told “correct” or “incorrect”, her feedback being the single bit 1[ŷt ≠ yt]. We assume that the Learner’s hypothesis class is the set of K-tuples of vectors W = ⟨w1, . . . , wK⟩ where wi ∈ Rⁿ (we can think of W as the K × n hypothesis matrix). Given an instance xt, such a hypothesis will produce a K-tuple of “scores” ⟨w1 · xt, . . . , wK · xt⟩, and the Learner’s prediction will be the class with the largest score: ŷt = arg max_{k ∈ [K]} wk · xt.
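For concreteness, the protocol above can be sketched with the Banditron of Kakade, Shalev-Shwartz, and Tewari (2008), the known O(T^{2/3})-regret baseline for this setting. The problem sizes, exploration rate, and the synthetic data generator below are illustrative assumptions, not from the paper; only the prediction rule, the one-bit feedback, and the Banditron update follow the setting described.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K, T = 5, 3, 500      # dimension, number of classes, horizon (illustrative)
gamma = 0.05             # exploration rate (illustrative choice)

# Hidden linear labeler, used only to synthesize (x_t, y_t) for the demo.
W_star = rng.normal(size=(K, n))

W = np.zeros((K, n))     # Learner's K x n hypothesis matrix
mistakes = 0

for t in range(T):
    x = rng.normal(size=n)
    y = int(np.argmax(W_star @ x))     # true class; never revealed directly

    y_hat = int(np.argmax(W @ x))      # highest-scoring class under W

    # Explore: play y_hat with prob. 1 - gamma, otherwise a uniform class.
    p = np.full(K, gamma / K)
    p[y_hat] += 1.0 - gamma
    y_tilde = int(rng.choice(K, p=p))

    feedback = int(y_tilde == y)       # the single bit the Learner observes
    mistakes += 1 - feedback

    # Banditron update: an unbiased estimate, built from the one-bit
    # feedback, of the full-information multiclass Perceptron update.
    U = np.zeros((K, n))
    U[y_hat] -= x
    if feedback:
        U[y_tilde] += x / p[y_tilde]
    W += U
```

In expectation over the random class ỹ, the update U equals the full-information Perceptron step, which is why the one observed bit suffices; the open problem asks whether this O(T^{2/3}) guarantee can be improved to √T by an efficient algorithm.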


Similar resources

Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret

We present an efficient second-order algorithm with Õ((1/η)√T) regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by η, for a range of η restricted by the norm of the competitor. The family of loss functions ranges from hinge loss (η = 0) to squared hinge loss (η = 1). This provides a solution to the ...

Full text


Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in [AR09], which is parameterized by a scalar α. We prove that the regret of NEWTRON is O(log T) when α is a constant that does not vary with the horizon T, and at most O(T^{2/3}) if α is allowed to increase t...

Full text


Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization

We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret. The setting is a natural generalization of the nonstochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We show how the difficulties encountered by...

Full text


Publication date: 2009